This article originally appeared in The Bar Examiner print edition, Spring 2022 (Vol. 91, No. 1), pp. 51–53.

By Rosemary Reshetar, EdD*

It’s a not uncommon question: Why are bar exam pass rates across US jurisdictions lower in February than in July? The quick answer is that, on average, the examinees taking the exam in February do not answer as many questions correctly on the Multistate Bar Examination (MBE) as those who take the exam in July. But how are we sure this is the main factor? How do we know it is something different about the test taker populations as opposed to variations in the difficulty or fairness of the questions? We know this because (1) the February and July bar exams are built to be maximally similar, (2) the exam scores are equated and scaled across administrations, and (3) we can see from other measures that the examinee population that takes the exam is different in February than it is in July.1
1. The tests are built to be maximally similar.
NCBE produces test questions and components used on most bar exams administered in US jurisdictions.2 The 200 multiple-choice questions that make up the MBE serve as a sound basis for equating and scaling, enabling NCBE and/or jurisdictions to perform the mathematical calculations that ensure scores on the full bar exam (MBE plus written components) are interchangeable (or have the same meaning) across administrations.
Each set of 200 MBE items is developed over a roughly two-year process, during which questions are pretested for clarity, fairness, and reliability.3 Each set is built to the same statistical and content specifications. Questions administered in February undergo the same rigorous review as those for July, and no question begins its life as a “February” question or a “July” question. During the pretesting phase, questions do not contribute to an examinee’s score. Thus, although substantial work is done to ensure that questions are appropriate in length, difficulty, and content before they even appear as pretest items, an item contributes to a person’s score only after its pretest performance statistics have been reviewed. In the same vein, no question is designated as a February or July question based on its individual difficulty.
2. Exam scores are equated and scaled across administrations.
On every bar exam, a set number of “equator” MBE questions is used in what is called a “mini test” embedded within the complete MBE.4 These questions appear on multiple exams over time, and their purpose is to help determine how prepared the current group of examinees is compared with prior groups: because every group answers a set of the equator questions, it is possible to compare performance across groups. So how do we know whether performance looks stronger or weaker on one administration versus another? To simplify a complex process: we compare performance on identical, embedded items.
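For readers curious about the mechanics, the sketch below illustrates a generic linear equating step of the kind described in the article cited in note 4. The function, the numbers, and the use of Python are illustrative assumptions only; this is not NCBE’s actual procedure or code.

```python
# A minimal, hypothetical sketch of linear equating, loosely in the spirit of
# the "simple linear equation" described in the Harris article cited in note 4.
# All names and numbers are invented for illustration.

def linear_equate(x, mean_new, sd_new, mean_ref, sd_ref):
    """Place a raw score x from the new form onto the reference scale by
    matching the two forms' means and standard deviations."""
    return mean_ref + (sd_ref / sd_new) * (x - mean_new)

# Suppose analysis of the equator items indicates the new form ran slightly
# harder: raw scores on the new form (mean 128.0, SD 15.5) are mapped onto
# the reference scale (mean 131.0, SD 16.0).
print(round(linear_equate(125, 128.0, 15.5, 131.0, 16.0), 1))  # ~127.9
```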
In addition to the MBE, bar exams contain “constructed response” questions. For most US jurisdictions, these come via the Multistate Essay Examination (MEE) and Multistate Performance Test (MPT). For those jurisdictions for which NCBE provides scoring services (including all UBE jurisdictions), constructed response scores are scaled to MBE scores.5 Jurisdictions that develop their own essays and do not use NCBE’s scoring services may work with psychometricians on staff or in consulting roles to follow a scaling process using the jurisdiction’s materials.
Like equating, scaling ensures a test taker’s score is not affected by when they take the exam or by which particular MEE and MPT questions appear on their exam. Essays are graded on a set scale, typically 1–6 or 1–10 (depending on jurisdiction policies). For the following example, we’ll use the 1–6 scale. Within the exam cohort, the best MEEs and MPTs will receive 6s, the next best 5s, and so on down the scale. An answer can receive a zero, though that score is typically reserved for answers that are blank or effectively nonresponsive. To be clear, an essay that receives a 6 isn’t expected to be perfect; it is simply appreciably better than the other essays in that administration’s pool. Because essay scores are relative, a 6 on one exam is not strictly equal to a 6 on another.
To keep exam scoring consistent across administrations, NCBE scales essay scores to equated MBE scores.6 Each jurisdiction grades its own essays,7 so the MBE performance within that jurisdiction establishes both the average (mean) and standard deviation of the essay scores after scaling. For example, if the average MBE score within a jurisdiction is 135, the average essay score will also be scaled to 135. An individual examinee can have a high MBE and a low essay score, or vice versa. But the jurisdiction’s cohort for a particular administration will have the same average and standard deviation, and a similar range of MBE and essay scores.
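As a rough illustration of that mean-and-standard-deviation matching, the sketch below rescales a hypothetical cohort’s raw written scores onto the same cohort’s equated MBE scores. The names and numbers are invented for demonstration; NCBE’s production process involves additional steps and safeguards.

```python
# A minimal, hypothetical sketch of scaling written scores to the MBE: after
# scaling, the cohort's written scores share the cohort's MBE mean and
# standard deviation. Not NCBE's production code.
from statistics import mean, pstdev

def scale_to_mbe(raw_written, mbe_scores):
    """Linearly rescale raw written (MEE/MPT) totals so the group's mean and
    standard deviation match those of the same group's equated MBE scores."""
    w_mean, w_sd = mean(raw_written), pstdev(raw_written)
    m_mean, m_sd = mean(mbe_scores), pstdev(mbe_scores)
    return [m_mean + m_sd * (w - w_mean) / w_sd for w in raw_written]

# Hypothetical five-person cohort: raw written totals and the same examinees'
# equated MBE scores (jurisdiction MBE mean of 135).
raw_written = [22, 30, 26, 35, 28]
mbe_scores = [128, 141, 133, 146, 127]
print([round(s, 1) for s in scale_to_mbe(raw_written, mbe_scores)])
# The scaled written scores now average 135, matching the MBE mean.
```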
What does all this mean? Ultimately, it means that it doesn’t matter when an examinee takes the exam. If they score high, they will do so on either a February or a July exam. If they struggle, this likewise will be true whenever they take the test. Equating and scaling ensure the whole examinee pool receives fair and accurate scores regardless of when they sit for the exam.8
3. The examinee population changes between February and July.
Returning to the original question: Why are the MBE score average and distribution, and the corresponding pass rates, lower in February than in July? Parts 1 and 2 above helped rule out explanations based on characteristics of the exam. July exams are not built to be different from February exams (see Part 1); in fact, they share a small number of identical items to help ensure stability in measuring performance over time (see Part 2). Having ruled out explanations associated with the test itself, we turn to the examinee populations to help understand patterns in performance over time.
Historically, the composition of the examinee pool differs between February and July. For one thing, the population of test takers is roughly twice as large in July as in February. That difference in size alone hints that meaningful differences exist between the groups, though we would not expect the size of the population itself to affect performance. What does reliably affect performance is the percentage of examinees taking the exam for the first time versus those retaking it: on average, first-time takers pass at a higher rate than repeaters. The numbers from 2019, the last prepandemic year, are typical: all likely first-time test takers earned an average MBE score of 143.8, while likely repeaters earned an average MBE score of 132.4.9
Breaking out those numbers by administration provides additional insight. In February 2019, nearly two-thirds of examinees (approximately 62%) took the exam after not passing on one or more prior exams and earned a mean MBE score of 131.2. First-time examinees (approximately 22%) earned a mean MBE score of 135.6.
July 2019 tells a very different story: the approximately 65% of examinees who took the bar exam for the first time earned an average MBE score of 145.1.10 Likely retakers made up approximately 23% of the examinee pool and had a lower mean MBE score (129.1) than the repeaters who sat in February. Nevertheless, the overall numbers in July skewed higher because of the higher average MBE scores earned by the larger portion of the examinee pool; they skewed lower in February because of the larger share of low-scoring repeaters.11
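A back-of-the-envelope calculation with the figures above shows how the mix of first-time takers and repeaters, by itself, moves the overall mean. Because it ignores the examinees whose status was not reported (see note 9), the results are rough approximations rather than official statistics.

```python
# A back-of-the-envelope illustration, using only the approximate figures
# quoted above, of how the first-timer/repeater mix moves the overall mean.
# Examinees whose status was not reported are ignored, so these weighted
# means are rough approximations, not official NCBE statistics.

def weighted_mean(groups):
    """groups: list of (share_of_pool, group_mean_mbe) pairs."""
    total_share = sum(share for share, _ in groups)
    return sum(share * m for share, m in groups) / total_share

feb_2019 = [(0.62, 131.2), (0.22, 135.6)]  # repeaters, first-timers
jul_2019 = [(0.23, 129.1), (0.65, 145.1)]  # repeaters, first-timers

print(round(weighted_mean(feb_2019), 1))  # ~132.4
print(round(weighted_mean(jul_2019), 1))  # ~140.9
```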
While the February and July exams are carefully built, equated, and scaled so that neither administration presents an advantage or disadvantage based on when an examinee sits, the composition of the examinee pool has a significant impact on the MBE mean, and therefore on pass rates.
To close, wider UBE adoption plays a part as well. The UBE confers a transferable score, which lessens the need for successful examinees to sit for bar exams in multiple jurisdictions, something many did in February before the UBE. This effect on the February examinee pool, and the resulting likely drop in the mean score for repeaters, will only continue as more jurisdictions join the UBE fold.12
Notes
1. This article is meant to provide a brief explanation of the processes for building and scoring the bar exam. For more in-depth information, please review the articles cited below.
2. Fifty-four of the 56 US jurisdictions use the Multistate Bar Examination (MBE), 47 use part or all of the Multistate Essay Examination (MEE), and 49 use part or all of the Multistate Performance Test (MPT). As of February 2022, the 39 (soon to be 41) Uniform Bar Examination (UBE) jurisdictions use all three components: the MBE, MEE, and MPT. All jurisdictions grade their own MEEs and MPTs; NCBE provides score scaling services for 40 jurisdictions, and the remaining 16 jurisdictions perform their own score scaling.
3. C. Beth Hill, “MBE Test Development: How Questions Are Written, Reviewed, and Selected for Test Administrations,” 84(3) The Bar Examiner (September 2015) 23–28.
4. For a particularly helpful article on the use of equators, including the simple linear equation used to make the equating calculation, see Deborah J. Harris, “Equating the Multistate Bar Examination,” 72(3) The Bar Examiner (August 2003) 12–18.
5. This is broadly true of the bar exam in general, although some jurisdictions that perform their own scaling may use alternative methods. Susan M. Case, “The Testing Column: Demystifying Scaling to the MBE: How’d You Do That?,” 74(2) The Bar Examiner (May 2005) 45–46.
6. Id.
7. Judith A. Gundersen, “It’s All Relative—MEE and MPT Grading, That Is,” 85(2) The Bar Examiner (June 2016) 37–45.
8. It is a common misperception that an examinee cohort can pull down the scores of any one examinee, making the July exam more desirable than the February exam.
9. Examinees are categorized as “likely” first-time and repeat test takers based on information jurisdictions provide. As not all jurisdictions have provided this information historically, NCBE acknowledges that these numbers are estimates; the remaining examinees, counted as neither likely first-time nor likely repeat test takers, are those for whom NCBE has not received jurisdiction reports. The ratio of first-time to repeating examinees was notably different in February 2021, apparently due to the challenges posed by the COVID-19 pandemic, but a more typical ratio returned in February 2022. See “National Mean of 132.6 for February 2022 MBE,” NCBE, April 8, 2022.
10. This is nearly 10 points higher than the average MBE score earned by first-time takers who took the exam in February. Given the use of the equating questions, why the significant difference between February and July first-time takers? One likely explanation is that February first-time examinees typically fall into three broad categories: those who could have taken the exam at the prior July administration but for some reason chose not to; those who elected, where possible, to graduate a semester early; and those who graduated later than their classmates. Each of these groups likely faced different challenges as they approached the February exam.
11. Although it may be cold comfort to those examinees who do not pass on their first or even second attempts, American Bar Association (ABA) data indicate that most graduates of ABA-accredited law schools pass the bar exam within two years of graduation; 91.17% of those who graduated in 2019 did so. See “Statistics,” American Bar Association.
12. See “National Mean of 132.6 for February 2022 MBE.”
Rosemary Reshetar, EdD, is the Director of Assessment and Research for the National Conference of Bar Examiners.
*The author thanks Joanne Kane, PhD, Associate Director of Testing for NCBE, and Andrew A. Mroch, PhD, Senior Research Psychometrician for NCBE, for their contributions to the article.